# Replication Package: Demographic Structure and International Capital Flows --- Evidence from 140 Countries

## Overview

This package reproduces all results in "Demographic Structure and International Capital Flows: Evidence from 140 Countries." The paper is the follow-up to the original 69-country analysis and expands the sample to 140 countries, covering 97% of world population and 99% of world GDP.

## System Requirements

- **Python**: 3.11 or later
- **RAM**: 8 GB minimum
- **Disk**: 2 GB for raw data, 500 MB for processed data and outputs
- **OS**: Linux, macOS, or Windows (WSL tested)
- **Internet**: Required for initial data download

## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Run the full pipeline (downloads data, processes, estimates, generates outputs)
python run_pipeline.py --step all --fred-key YOUR_FRED_API_KEY

# 3. Run analysis scripts (after pipeline completes)
python scripts/phase4cd_interactions_pension.py
python scripts/phase4e_joint_tests_ge_decomp.py
python scripts/phase4f_horserace_jackknife.py
python scripts/phase4g_multiple_testing_cca.py
python scripts/phase4h_cca_event_study.py
python scripts/phase5bc_projections_ge.py
python scripts/phase6_revision_robustness.py

# 4. Generate paper figures
python scripts/generate_paper_figures.py

# 5. (Optional) Generate publication tables
python -c "from src.paper_tables import generate_all_tables; generate_all_tables()"
```

## FRED API Key

A free FRED API key from the Federal Reserve Bank of St. Louis is required for interest rate data. Register at:
https://fred.stlouisfed.org/docs/api/api_key.html

Pass the key via the `--fred-key` command-line argument. If omitted, the pipeline will skip FRED data and the extended model will use IMF rates only.
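
The fallback behavior described above can be pictured with a minimal sketch. Only the `--fred-key` flag comes from this package; the function name `select_rate_sources` and the source labels are hypothetical, and the actual logic in `run_pipeline.py` may differ.

```python
import argparse


def select_rate_sources(argv):
    """Return the interest-rate sources implied by the CLI arguments.

    Hypothetical helper: it only illustrates the documented behavior.
    """
    parser = argparse.ArgumentParser()
    # --fred-key is the documented flag; it is optional.
    parser.add_argument("--fred-key", default=None)
    # Ignore other flags (e.g. --step) that this sketch does not model.
    args, _unknown = parser.parse_known_args(argv)

    sources = ["IMF"]  # IMF rates are used with or without a key
    if args.fred_key:
        sources.append("FRED")  # FRED is added only when a key is supplied
    return sources
```

Under these assumptions, `select_rate_sources(["--step", "all"])` would yield IMF only, which corresponds to the extended model running on IMF rates.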

## Pipeline Steps

The pipeline runs 8 sequential steps. Each step caches its output, so re-runs skip completed steps unless `--force` is specified.

| Step | Description | Output |
|------|-------------|--------|
| `download` | Fetch raw data from 10 external sources | `data/raw/*.csv`, `*.xlsx`, `*.xls` |
| `demographics` | Process UN WPP into age shares and polynomial variables | `data/processed/demographic_shares.csv`, `demographic_polynomials.csv` |
| `macro` | Assemble EBA control variable panel (expanded sample) | `data/processed/macro_panel.csv` |
| `rates` | Build interest rate panel from FRED + IMF | `data/processed/interest_rate_panel.csv` |
| `merge` | Merge all sources; winsorize fiscal balance; log-transform lending rate | `data/processed/full_panel.csv` |
| `estimate` | Run 3 model specifications on original-69, expanded, and 140-country samples | `output/tables/regression_*.csv`, `model_comparison*.csv` |
| `scenarios` | Projections through 2060, counterfactuals, residual decomposition | `output/tables/projection_table*.csv` |
| `visualize` | Generate figures and summary tables | `output/figures/*.png`, `output/tables/*.csv` |

Run individual steps with: `python run_pipeline.py --step download`

Run on the original 69-country sample for comparison: `python run_pipeline.py --original`
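
The caching rule (skip a step whose outputs already exist unless `--force` is given) can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: the helper name `should_run` is hypothetical, and it assumes each step declares its outputs as explicit file paths rather than glob patterns.

```python
from pathlib import Path


def should_run(outputs, force=False):
    """Decide whether a pipeline step needs to run.

    A step is skipped only when every expected output file already exists
    and `force` is False. Hypothetical helper for illustration.
    """
    if force:
        return True
    return not all(Path(p).exists() for p in outputs)
```

For example, `should_run(["data/processed/full_panel.csv"])` would return `False` once the `merge` step has written its output, and `True` again when called with `force=True`.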

## Analysis Scripts

After the pipeline completes, the analysis scripts in `scripts/` produce additional results reported in the paper:

| Script | Description |
|--------|-------------|
| `phase4cd_interactions_pension.py` | KAOPEN interactions, three-way interactions, pension model tests |
| `phase4e_joint_tests_ge_decomp.py` | Joint F-tests, savings-investment decomposition, GE clearing |
| `phase4f_horserace_jackknife.py` | KAOPEN vs. GDP horse race, leave-one-region-out jackknife |
| `phase4g_multiple_testing_cca.py` | Multiple testing corrections (Bonferroni/BH), CCA decomposition |
| `phase4h_cca_event_study.py` | CCA event study (pre/post capital account opening) |
| `phase5bc_projections_ge.py` | 140-country projections through 2060, GE clearing overlay |
| `phase6_revision_robustness.py` | Remittance robustness, CCA age profiles, jackknife ranges, PE vs GE |
| `generate_paper_figures.py` | Generate all 4 paper figures |

## Verifying Results

After running the full pipeline and all analysis scripts, verify against the reference outputs in `output/`:

### Key Statistics to Check (140-Country Sample)

| Metric | Expected Value |
|--------|---------------|
| Countries in baseline sample | 137 |
| Baseline model N obs | 2,730 |
| Baseline model R-squared | 0.273 |
| Baseline Z_1 p-value | < 0.001 |
| Baseline Z_2 p-value | < 0.001 |
| Baseline Z_3 p-value | < 0.001 |
| Baseline fiscal_bal_gdp coefficient | 0.307 |
| Baseline AR(1) rho | 0.811 |
| Extended model N obs | 1,626 |
| Extended model N countries | 90 |
| Extended model R-squared | 0.290 |
| Z_1 x KAOPEN p-value | 0.039 |
| Z_2 x KAOPEN p-value | 0.021 |
| Z_3 x KAOPEN p-value | 0.013 |
| KAOPEN interactions joint F-test | p < 0.001 |
| Demographics-only model N obs | 5,323 |
| Demographics-only model N countries | 141 |
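
One way to automate this comparison is a small tolerance check. The sketch below copies a subset of the expected values from the table above; the dictionary keys are hypothetical labels (not column names from the output CSVs), and the reader would need to fill in `actual` from their own run, since the layout of `model_comparison_140.csv` is not documented here.

```python
# Expected values copied from the 140-country table above.
# Keys are hypothetical labels, not column names from the output CSVs.
EXPECTED_140 = {
    "baseline_n_obs": 2730,
    "baseline_r2": 0.273,
    "baseline_fiscal_bal_gdp": 0.307,
    "baseline_ar1_rho": 0.811,
    "extended_n_obs": 1626,
    "extended_n_countries": 90,
    "extended_r2": 0.290,
}


def find_mismatches(expected, actual, tol=0.0005):
    """Return {name: (expected, actual)} for values differing by more than tol.

    Entries missing from `actual` are reported with None as the actual value.
    """
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or abs(got - want) > tol:
            mismatches[name] = (want, got)
    return mismatches
```

The default tolerance of 0.0005 reflects the three-decimal rounding used in the table; an empty result means every supplied statistic matched.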

### Key Statistics to Check (Original 69-Country Comparison)

| Metric | Expected Value |
|--------|---------------|
| Baseline model R-squared | 0.309 |
| Baseline model N obs | 1,857 |
| Baseline model N countries | 67 |

### Output File Inventory

**Tables** (`output/tables/`) --- 42 CSV files:

| File | Description |
|------|-------------|
| `regression_demographics_only.csv` | Model 1 coefficients (expanded sample) |
| `regression_demographics_only_140.csv` | Model 1 coefficients (140-country sample) |
| `regression_demographics_only_original69.csv` | Model 1 coefficients (original 69 countries) |
| `regression_baseline_demo_plus_eba.csv` | Model 2 coefficients (expanded sample) |
| `regression_baseline_demo_plus_eba_140.csv` | Model 2 coefficients (140-country sample) |
| `regression_baseline_demo_plus_eba_original69.csv` | Model 2 coefficients (original 69 countries) |
| `regression_extended_plus_rates.csv` | Model 3 coefficients (expanded sample) |
| `regression_extended_plus_rates_140.csv` | Model 3 coefficients (140-country sample) |
| `regression_extended_plus_rates_original69.csv` | Model 3 coefficients (original 69 countries) |
| `model_comparison.csv` | R-squared, N, rho across models (expanded) |
| `model_comparison_140.csv` | Model comparison (140 countries) |
| `model_comparison_original69.csv` | Model comparison (original 69) |
| `master_comparison_69_108_140.csv` | Three-way sample comparison |
| `projection_table.csv` | Demographic contributions (expanded) |
| `projection_table_140.csv` | Demographic contributions (140 countries) |
| `demographic_contributions.csv` | Country-level demographic contributions |
| `demographic_contributions_140.csv` | 140-country demographic contributions |
| `inflection_points_140.csv` | Demographic window timing (140 countries) |
| `expansion_inflection_points.csv` | Inflection points for expansion sample |
| `joint_f_test_interactions.csv` | Joint F-test for KAOPEN interactions |
| `kaopen_income_joint_tests.csv` | KAOPEN x income three-way interaction tests |
| `kaopen_vs_gdp_horserace.csv` | Horse race: KAOPEN vs. income interactions |
| `three_way_interactions.csv` | Three-way interaction model results |
| `subsample_interaction_comparison.csv` | Interaction effects by subsample |
| `pension_model_tests.csv` | Pension system interaction results |
| `savings_investment_channel_test.csv` | Savings vs. investment decomposition |
| `nonlinearity_tests.csv` | NFA and life expectancy nonlinearity tests |
| `nonlinearity_tests_original69.csv` | Nonlinearity tests (original sample) |
| `jackknife_baseline.csv` | Leave-one-region-out jackknife (baseline) |
| `jackknife_extended.csv` | Leave-one-region-out jackknife (extended) |
| `rolling_window_demo_signal_original_69.csv` | Rolling window coefficients (69 countries) |
| `rolling_window_demo_signal_expanded_108.csv` | Rolling window coefficients (expanded) |
| `multiple_testing_correction.csv` | Bonferroni and BH corrections |
| `cca_decomposition.csv` | CCA group decomposition |
| `cca_event_study.csv` | CCA event study results |
| `ge_clearing_rates_140.csv` | GE equilibrium rate adjustments |
| `ge_clearing_projections_140.csv` | GE-adjusted projections |
| `missing_countries_audit.csv` | Data coverage audit |
| `remittance_robustness.csv` | Remittance control test (6 model specifications) |
| `cca_age_profile_contrast.csv` | CCA vs rest-of-world demographic profiles |
| `jackknife_coefficient_ranges.csv` | Conservative vs full-sample coefficient ranges |
| `pe_vs_ge_projections.csv` | Side-by-side PE and GE projections for 15 countries |

**Paper Figures** (`paper/figures/`) --- 4 PNG files:

| File | Description |
|------|-------------|
| `fig1_age_coefficients.png` | Implied age-group coefficients across samples |
| `fig3_rolling_coefficients.png` | Rolling-window demographic signal |
| `fig6_projections.png` | 140-country projections through 2060 |
| `fig7_model_comparison.png` | Cross-sample model comparison |
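
A quick sanity check on the inventory is to count the files in each directory. The sketch below assumes the counts stated above (42 CSV tables, 4 PNG figures) and the documented directory names; the function names are hypothetical, and an exact-count check may be too strict if a run leaves extra intermediate files behind.

```python
from pathlib import Path


def count_outputs(directory, pattern):
    """Count regular files in `directory` whose names match the glob `pattern`."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


def inventory_ok(tables_dir="output/tables", figures_dir="paper/figures"):
    """Check the documented counts: 42 CSV tables and 4 PNG figures."""
    return (count_outputs(tables_dir, "*.csv") == 42
            and count_outputs(figures_dir, "*.png") == 4)
```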

## Source Code Structure

### Pipeline (`src/`)

| File | Purpose |
|------|---------|
| `src/download.py` | Data acquisition from APIs and URLs |
| `src/demographics.py` | UN WPP processing, polynomial construction |
| `src/macro.py` | EBA control variables, expanded country sample definitions |
| `src/interest_rates.py` | Rate differentials, carry trade variables |
| `src/model.py` | PanelGLS class, model estimation |
| `src/scenarios.py` | Projections, counterfactuals, GE clearing |
| `src/visualize.py` | Figure and table generation |
| `src/visualize_breaks.py` | Structural break visualizations |
| `src/structural_breaks.py` | Rolling-window and break test estimation |
| `src/paper_tables.py` | Publication-formatted tables |

### Analysis Scripts (`scripts/`)

The 7 phase scripts in `scripts/` (`phase4cd_*` through `phase6_*`) implement the phased analysis plan documented in the paper. They read from `output/tables/` and `data/processed/` and write additional results back to `output/tables/`. The scripts are numbered by phase and should be run in order after the pipeline completes; `generate_paper_figures.py` runs last.

## Variable Transformations

The merge step applies two transformations to improve model specification:

1. **Fiscal balance/GDP winsorization**: Values are clipped at the 1st and 99th percentiles (approximately [-20.2, 20.1]) to limit the influence of extreme values from hyperinflation and conflict episodes. Without winsorization, the fiscal coefficient is attenuated.

2. **Log lending rate**: Raw lending rates range from 0% to 99,765% due to hyperinflation episodes. The variable is transformed as log(1 + rate/100), mapping to continuously compounded rates.
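
The two transformations above can be sketched in plain Python. This is an illustration, not the code in the merge step: the function names are hypothetical, the percentile rule (linear interpolation between order statistics) is an assumption, and the actual pipeline may use a vectorized library routine with a different convention.

```python
import math


def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Clip every value to the [lower_q, upper_q] percentile range of the sample."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]


def log_lending_rate(rate_pct):
    """Transform a lending rate given in percent to log(1 + rate/100)."""
    return math.log1p(rate_pct / 100.0)
```

With this transform, the largest raw lending rate of 99,765% maps to log(998.65), roughly 6.9, while a rate of 0% maps to 0.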

## Key Differences from Original Paper

| Aspect | Original Paper | This Paper |
|--------|---------------|------------|
| Country coverage | 69 (EBA-49 + 20 SSA) | 140 countries |
| Population coverage | ~80% of world | 97% of world |
| Baseline N obs | 1,857 | 2,730 |
| Z_1 significance | p = 0.097 (marginal) | p < 0.001 |
| KAOPEN interactions | All p < 0.005 | Developing-only (all p > 0.76 for AEs) |
| Pension interactions | Not tested | Significant (p = 0.038) |
| Analysis scripts | Integrated in pipeline | Separate phased scripts |

## Known Issues

1. **PWT download timeouts**: The primary PWT host (`dataverse.nl`) frequently times out. The pipeline includes a fallback URL (`rug.nl`). If both fail, manually download PWT 10.0 and place it in `data/raw/`.

2. **UN WPP URL changes**: The UN periodically restructures its WPP download URLs. If the download step fails for UN data, check `https://population.un.org/wpp/` for the current bulk download location.

3. **IMF database restructuring**: The IMF renamed the `IFS` database to `MFS_IR` for interest rate data. The code uses `MFS_IR`. If this changes again, update the `database_id` parameter in `src/download.py`.

4. **WSL memory**: On Windows Subsystem for Linux with default memory limits (< 8 GB), the pipeline may require running steps individually rather than all at once.

5. **Path dependencies**: The pipeline is intended to be run from the `followup/replication/` directory. However, `run_pipeline.py` resolves paths relative to `Path(__file__).parent`, so it should work regardless of the current working directory.
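
The `Path(__file__).parent` pattern mentioned in item 5 can be illustrated with a short sketch. The helper name `project_path` is hypothetical and is not taken from `run_pipeline.py`; it only shows why anchoring on the script's own location removes the dependence on the working directory.

```python
from pathlib import Path


def project_path(anchor_file, *parts):
    """Build a path relative to the directory that contains `anchor_file`.

    Passing `__file__` as the anchor makes the result independent of the
    current working directory.
    """
    return Path(anchor_file).resolve().parent.joinpath(*parts)
```

Inside a script, a call such as `project_path(__file__, "data", "raw")` would point at `data/raw/` next to that script, wherever the command is launched from.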

## Citation

If you use this code or data in your research, please cite:

> "Demographic Structure and International Capital Flows: Evidence from 140 Countries." Working Paper, February 2026.
